[WIP, RFC] doc/memos: Added RDM on high level timer API requirements and common features #12970
I think I now requested reviews from every maintainer that contributed to either xtimer or ztimer and every reviewer of ztimer.
If it is a draft, please keep to the naming scheme for drafts:
Done
- The implementation must only disable IRQs for short periods of time
- The implementation must cause as few IRQs as possible and its ISRs must
  be as short as possible
- The implementation should provide means to execute callbacks in thread
this is IMO out of scope
... or should maybe go to the utility functionality...
> or should maybe go to the utility functionality

Yep. It is already in the feature requirements below utilities. To me, this is an important thing to improve real time capabilities in a convenient way. (Obviously, one could just write their own handlers to call `event_post()`; but I think having a utility function for that can be much easier to use.)
I'm not sure if it makes sense to have this written twice. To me this is both a feature requirement and a realtime requirement; so I just added it twice.
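The utility discussed here can be pictured roughly as follows. This is only a sketch under stated assumptions: every `demo_*` name is hypothetical, and the small queue below is a stand-in for an event queue, not RIOT's actual event API. The point it illustrates is that the timer callback, which runs in interrupt context, does nothing but post an event, while the real work runs later in thread context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an event queue; these are NOT RIOT's types. */
typedef struct demo_event {
    struct demo_event *next;
    void (*handler)(struct demo_event *);
} demo_event_t;

typedef struct {
    demo_event_t *head;
    demo_event_t *tail;
} demo_queue_t;

/* Append an event in O(1). A real implementation would additionally
 * mask IRQs around the list update. */
static void demo_event_post(demo_queue_t *q, demo_event_t *ev)
{
    assert(q != NULL && ev != NULL);
    ev->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = ev;
    }
    else {
        q->head = ev;
    }
    q->tail = ev;
}

/* Pop the oldest event, or return NULL if the queue is empty. */
static demo_event_t *demo_event_get(demo_queue_t *q)
{
    demo_event_t *ev = q->head;
    if (ev != NULL) {
        q->head = ev->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
    }
    return ev;
}

/* Argument bundle handed to the timer callback. */
typedef struct {
    demo_queue_t *queue;
    demo_event_t *event;
} demo_timer_event_arg_t;

/* Timer callback (interrupt context): only posts, does no real work. */
static void demo_timer_cb_post_event(void *arg)
{
    demo_timer_event_arg_t *a = arg;
    demo_event_post(a->queue, a->event);
}

/* Thread side: drain the queue and run the handlers in thread context. */
static unsigned demo_handle_pending(demo_queue_t *q)
{
    unsigned handled = 0;
    demo_event_t *ev;
    while ((ev = demo_event_get(q)) != NULL) {
        ev->handler(ev);
        handled++;
    }
    return handled;
}

/* Example handler used to observe when the work actually runs. */
static int demo_work_done;
static void demo_work(demo_event_t *ev)
{
    (void)ev;
    demo_work_done++;
}
```

With a helper like `demo_timer_cb_post_event()` provided once, users would not need to write their own posting callback for every timer.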

## Precision Requirements

- Timer callbacks should never fire prematurely (+- 1 tick of the underlying
Important point. Is "-1" acceptable?
With that formulation, clearly yes. I think the effort needed to prevent this could be quite significant. So if no one cares about it triggering early by 1 tick enough to speak up, we should just allow this :-)
> I think the effort needed to prevent this could be quite significant.

`set(val) { val += 1; ... }`, or `run_callback() { while(now() < target_time) {}; ...` can achieve this.
We had a discussion some time ago, where people convinced me that "at least as" semantics are the only sane ones. Don't care about one tick? Set the timeout one less. (Having written that, I realize it also works the other way. :) )
The impact of this is very different depending on whether µs, ms, or even second ticks are used.
I'm OK with keeping `-1` for now.
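To make the "at least as" argument concrete, here is a small simulation sketch. It assumes, purely for illustration, that real time is measured in sub-tick units while the timer can only fire on tick edges; the function names and `SUBUNITS_PER_TICK` are hypothetical. It shows why a timer armed between two tick edges can fire almost one tick early, and how the `val += 1` approach trades that for being up to one tick late.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 10 sub-units of real time per timer tick. */
#define SUBUNITS_PER_TICK 10u

/* Real time (in sub-units) at which a timer armed at `now_sub` for `ticks`
 * ticks fires, given that the hardware only fires on tick edges. */
static uint32_t fire_time_naive(uint32_t now_sub, uint32_t ticks)
{
    /* The counter read truncates the partial tick that already elapsed. */
    uint32_t now_tick = now_sub / SUBUNITS_PER_TICK;
    return (now_tick + ticks) * SUBUNITS_PER_TICK;
}

/* "At least" variant: one extra tick covers the truncated partial tick. */
static uint32_t fire_time_at_least(uint32_t now_sub, uint32_t ticks)
{
    assert(ticks < UINT32_MAX);
    return fire_time_naive(now_sub, ticks + 1);
}
```

Arming a 3-tick timer (30 sub-units) at sub-time 15 fires after only 25 sub-units in the naive variant and after 35 in the "at least" variant. Armed exactly on a tick edge, the "at least" variant is a full tick late, which is the cost of never firing early.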
I updated the title to contain both the survey and the requirements analysis and also added every contributor as author (sorted by last name, alphabetically).
I agree with @aabadie here. From my point of view, the most crucial part of this document for our next steps is the requirements we set for our high-level timer API. Would be bizarre not to put this in the title of the RDM.
while not being too strict to rule out reasonable trade-offs e.g. between
ROM/RAM requirements and latency. An *O(n)* latency is e.g. feasible to
implement with a sorted linked list and therefore a reasonable lower bar.
O(n) is an unbound function, so this justification is inconsistent. For real-time requirements ask for O(1), I suppose.
For every concrete application, there is an upper bound for the number n of software timers being active at the same time. So for every concrete application, O(n) has an upper bound.
This is true for any finite set of numbers. So we could also go for an O(exp(n))?
> O(n) is an unbound function, so this justification is inconsistent. For real-time requirements ask for O(1), I suppose.

This is intentionally written to not be O(1), because of the reasonable trade-offs as described. We know how to do O(n) by simply using linked lists and sorting on insert, and we understand the runtime implications well. We don't know how to do this in O(1) without limiting the number of allowed timers.
We do multiplexing here, so one hardware timer (or more, but a small number) needs to be used to schedule N timeouts.
So when adding one of these timeouts (using a set(interval) function), that new timeout needs to be added into the list of already queued timers. Using linked lists, this is an O(n) operation. Using an array and binary search, this is O(log n). We need to be careful what to require here, because if we break O(n log n) for sorting n timers, we might attract a lot of attention.
It is possible to add an allowed timers limit (that is lower than available memory, e.g., 16 or 128) and enforce this.
We'd suddenly have O(1) on paper (meeting the requirements), but wouldn't have gained anything, other than that now a multitude of functions have an error case (timer_sleep() might fail with "maximum number of timers reached") that needs to be checked everywhere, which is unreasonable.
TL;DR let's not require something we don't know how to implement. This is phrased as "yadda yadda reasonable trade-offs".
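The "linked list, sorted on insert" approach mentioned above can be sketched as follows. This is a minimal illustration, not the code of xtimer or ztimer: the `demo_timer_t` type and function names are hypothetical, targets are absolute tick values, and counter wrap-around is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer node; `target` is an absolute expiry time in ticks.
 * Wrap-around of the tick counter is ignored in this sketch. */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
} demo_timer_t;

/* Insert `t` so the list stays sorted by ascending target.
 * Walks at most n nodes, hence O(n). Timers with equal targets keep
 * their insertion order. */
static void demo_timer_insert(demo_timer_t **head, demo_timer_t *t)
{
    assert(head != NULL && t != NULL);
    while (*head != NULL && (*head)->target <= t->target) {
        head = &(*head)->next;
    }
    t->next = *head;
    *head = t;
}

/* Remove and return the next timer to fire (the list head) in O(1). */
static demo_timer_t *demo_timer_pop(demo_timer_t **head)
{
    demo_timer_t *t = *head;
    if (t != NULL) {
        *head = t->next;
        t->next = NULL;
    }
    return t;
}
```

Because the nodes are supplied by the caller, no timer limit and no "maximum number of timers reached" error case is needed. With a sorted array instead, binary search finds the slot in O(log n), but shifting the later elements to make room is still linear.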
The complete timer management can be done in O(log(n)), which is almost constant in the number range we consider realistically. Further optimizations can apply, if algorithms account for the deadline of the next firing timer, which can be determined in constant time.
Current preliminary HiL-tests of @pokgak show a rather poor behavior of xtimer and a somewhat unclear signature of ztimer. We will publish once ready.
In any case, it is important that timer firing remains within predictable bounds independent of the number of instantiated timers - otherwise real-time gets out of control.
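One common data structure with O(log n) insertion and constant-time access to the next deadline is a binary min-heap. The comment above does not name a data structure, so the following is only an assumed illustration; all `demo_*` names and the fixed capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HEAP_CAP 16u /* fixed capacity: an assumption of this sketch */

typedef struct {
    uint32_t target[DEMO_HEAP_CAP]; /* absolute expiry times in ticks */
    size_t len;
} demo_heap_t;

static void demo_swap(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* O(log n): append at the end, then sift up. Returns -1 if the heap is full. */
static int demo_heap_push(demo_heap_t *h, uint32_t target)
{
    if (h->len == DEMO_HEAP_CAP) {
        return -1;
    }
    size_t i = h->len++;
    h->target[i] = target;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->target[parent] <= h->target[i]) {
            break;
        }
        demo_swap(&h->target[parent], &h->target[i]);
        i = parent;
    }
    return 0;
}

/* O(1): the earliest deadline is always at the root. */
static uint32_t demo_heap_peek(const demo_heap_t *h)
{
    assert(h->len > 0);
    return h->target[0];
}

/* O(log n): move the last element to the root, then sift down. */
static uint32_t demo_heap_pop(demo_heap_t *h)
{
    assert(h->len > 0);
    uint32_t min = h->target[0];
    h->target[0] = h->target[--h->len];
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1;
        size_t right = left + 1;
        size_t smallest = i;
        if (left < h->len && h->target[left] < h->target[smallest]) {
            smallest = left;
        }
        if (right < h->len && h->target[right] < h->target[smallest]) {
            smallest = right;
        }
        if (smallest == i) {
            break;
        }
        demo_swap(&h->target[smallest], &h->target[i]);
        i = smallest;
    }
    return min;
}
```

Note the trade-off this sketch makes visible: the array-backed heap needs a capacity limit and therefore an error return on insertion, which is the cost raised earlier in this thread. Cancelling an arbitrary timer would additionally require tracking each timer's index in the array.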
log(25) ~ 1.39, exp(25) ~ 7.2 x 10**10.
> log(25) ~ 1.39, exp(25) ~ 7.2 x 10**10.

Both bounded, which is exactly what a realtime system needs.
Really?
> Really?

Do you want to dive into the numerical analysis of the numbers you provided or can we agree here that both numbers are constants?
A realtime system does not require the system to be fast. It does however require the system to respond within the specified deadlines. This deadline can be on the order of microseconds or on the order of years, depending on the requirements.
Furthermore, a complex O(1) algorithm taking 10 seconds performs worse than a simple O(n) algorithm taking `10 ms + n * 1 ms` when n is known to be bounded to 100.
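Spelled out with the numbers from this comment (the symbols $T_{O(1)}$ and $T_{O(n)}$ are introduced here only for illustration), the worst case of the linear algorithm under the bound $n \le 100$ is:

```latex
T_{O(1)} = 10\,\mathrm{s},
\qquad
T_{O(n)}(n) = 10\,\mathrm{ms} + n \cdot 1\,\mathrm{ms}
\;\le\; 10\,\mathrm{ms} + 100 \cdot 1\,\mathrm{ms} = 110\,\mathrm{ms}
\quad (n \le 100)
```

So for this bounded $n$, the linear algorithm meets any deadline of 110 ms or more, while the constant-time one only meets deadlines of 10 s or more.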
As a solution here, why don't we rephrase a bit and specify that it must be trivial to determine the maximum number of active timers for an application in order to ensure bounded execution time?
@MichelRottleuthner do you want to add requirements, or are you fine with this for now?
I just had a deeper look into this RDM. It's been a while since the last time I did that.
All in all, this RDM covers all features we require for our use cases. A nice piece of work!
Just a minor comment. (Sorry for kicking in this late during the polishing process!)
ping @maribu!
This PR started out to be the RDM on timers, so we could base any further work on it with confidence. But currently it is a collection of requirements and some survey on features other timer implementations provide. It does not go into specific design questions for a high-level timer implementation.

At the same time, ztimer got merged and provides many solutions to problems we have with xtimer. But migrating to ztimer as the default timer is a bit stalled for lack of an RDM-approved design, which this RDM was supposed to give, but doesn't. I think we all agree that doing comprehensive research on "how to do high-level timers right[tm]" is a lot of work. Many of us think that ztimer is a step in the right direction, and we should migrate. AFAIK no one thinks xtimer can be turned into whatever we need without substantial rework. And xtimer effectively blocks RIOT from sleeping, which should be a fix-first huuuuge bug for an OS supposed to run on low-power devices.

Can we agree on most of the requirements, and concentrate on that for this PR? We could then move forward with ztimer and check if it meets this PR's requirements, in a follow-up RDM. (Would it make sense to even drop the survey part here and save it for the more extensive research RDM?)
I don't think the lack of an RDM stops ztimer from being used by default.
Even though I believe a rigorous and exhaustive design document would be essential for reworking the timer API, I don't see any of this here. It rather appears as a somewhat related informational document about "some thoughts about timers without rigorous justification". This certainly cannot serve as a foundation for redesign. The question is rather whether this is helpful and needed or whether this can simply be discarded.
Let me recapitulate the history of this:
A particularly interesting observation is that the very persons who were the first to call for an RDM as a requirement for any progress in regard to the timer API are seemingly the last persons to contribute to it. I'm willing to keep pushing this RDM to some sort of conclusion. But please keep me out of the loop for any follow-up.
Is that really the case? If ztimer works and is a drop in replacement for xtimer, nobody will care about the RDM anymore. What really blocks transition to |
From the discussion in the devel mailing list I pretty much got the impression. Here are some quotes of the key arguments leading to this RDM:
@MichelRottleuthner, 10. 12. 2019, full email for context
@MichelRottleuthner, 11. 12. 2019, full email for context
@MichelRottleuthner, 13. 12. 2019, full email for context @tcschmidt did not explicitly ask for an RDM, but for a "problem statement and design document" and wanted "a clear and falsifiable problem statement" prior to even discussing the adoption of To my best understanding, both @MichelRottleuthner and @tcschmidt saw completion of an RDM (or in case of @tcschmidt an RDM-like document) as requirement for adoption of
@maribu, 16. 12. 2019, full email for context I think it is fair to say that nobody's expectations towards this RDM were met. And honestly, I see no indication that a second try should work out better. I'm not sure if what @benpicco was suggesting to rather work on improving Btw: #15052 would be a huge step towards |
several people asked for an RDM to have a base for an informed discussion. it was not about "Having an RDM for the sake of having an RDM". quite often sound solutions need informed discussions, to prevent trial and error. ... as someone, i think it was Kaspar, said in the timer breakout session in Helsinki: why should we be successful this time? we know from the past that code alone is not enough.
@maribu thanks a lot for the work you put in this. Maybe we are falsely led to see this glass half empty.
Let's not lose sight of the above, which can be considered useful progress.
As this seems to be the place to put general ideas about the high-level timer API, here is another one: I think the high-level timer could be simplified if it configured the low-level timer in down-counting mode.
instead, with down-counting timers we only have to
Now this simplifies the What do you think? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.
Contribution description
Early draft for an RDM on a high level timer API.
Testing procedure
n/a
Issues/PRs references
None